Discrete-Mixture HMMs-based Approach for Noisy Speech Recognition
Abstract
It is well known that the application of hidden Markov models (HMMs) has led to a dramatic improvement in the performance of automatic speech recognition since the 1980s. In particular, large-vocabulary continuous speech recognition (LVCSR) became feasible by using sub-word recognition units such as phones. HMMs can effectively model a variety of speech characteristics. An HMM represents the transitions between statistical characteristics by means of the state sequence of a Markov chain. Each state of the chain is associated with either a discrete output probability distribution or a continuous output probability distribution. In the 1980s, discrete HMMs were mainly used as the acoustic models for speech recognition. The SPHINX speech recognition system was developed by K.-F. Lee in the late 1980s (Lee & Hon, 1988). It was a speaker-independent, continuous speech recognition system based on discrete HMMs. It was evaluated on the 997-word resource management task and obtained a word accuracy of 93% with a bigram language model. Subsequent comparative studies of discrete and continuous HMMs concluded that continuous-mixture HMMs outperform discrete HMMs. As a result, almost all recent speech recognition systems use continuous-mixture HMMs (CHMMs) as acoustic models. The parameters of CHMMs can be estimated efficiently under the assumption of a normal distribution. Discrete hidden Markov models (DHMMs) based on vector quantization (VQ), by contrast, are affected by quantization distortion. However, CHMMs may be ill-suited to recognizing noisy speech because the assumption of a normal distribution may not hold. DHMMs can represent more complicated distribution shapes and are therefore expected to be useful for noisy speech. This chapter introduces new methods for noise-robust speech recognition using discrete-mixture HMMs (DMHMMs) based on maximum a posteriori (MAP) estimation.
The aim of this work is to develop robust speech recognition for adverse conditions that contain both stationary and non-stationary noise. In particular, we focus on impulsive noise, which is a major problem in practical speech recognition systems. The DMHMM is one type of DHMM framework. It was originally proposed to reduce the computational cost of the decoding process (Takahashi et al., 1997).
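To make the idea of a discrete-mixture output probability concrete, the following is a minimal sketch, not the authors' implementation. It assumes the commonly described DMHMM form in which each feature frame is split into subvectors, each subvector is vector-quantized against its own codebook, and the state output probability is a weighted sum over mixtures of products of discrete probabilities. All names (`quantize`, `dmhmm_output_prob`) and the toy codebooks, weights, and probability tables are hypothetical values chosen only for illustration.

```python
def quantize(subvector, codebook):
    """Return the index of the nearest codeword (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((x - c) ** 2 for x, c in zip(subvector, codebook[k])),
    )


def dmhmm_output_prob(subvectors, codebooks, weights, tables):
    """Output probability of one frame for one HMM state.

    Assumed form: b(o) = sum_m w[m] * prod_s tables[m][s][code_s],
    where code_s is the VQ index of subvector s.
    """
    codes = [quantize(v, cb) for v, cb in zip(subvectors, codebooks)]
    total = 0.0
    for w, table in zip(weights, tables):
        prob = w
        for s, code in enumerate(codes):
            prob *= table[s][code]
        total += prob
    return total


# Toy setup (hypothetical numbers): 2 subvectors, 2 codewords each, 2 mixtures.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],  # codebook for subvector 0
    [[0.0], [2.0]],            # codebook for subvector 1
]
weights = [0.6, 0.4]           # mixture weights, sum to 1
tables = [
    [[0.9, 0.1], [0.7, 0.3]],  # mixture 0: one discrete distribution per subvector
    [[0.2, 0.8], [0.5, 0.5]],  # mixture 1
]

frame = [[0.9, 1.2], [0.1]]    # one feature frame, already split into subvectors
b = dmhmm_output_prob(frame, codebooks, weights, tables)
print(round(b, 6))
```

Because each table row is a free-form discrete distribution rather than a Gaussian, a model of this kind can in principle represent irregular distribution shapes, at the price of the quantization distortion introduced by `quantize`.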
Similar resources
Noisy Speech Recognition with Discrete-Mixture HMMs Based on MAP Estimation
In this paper, we develop a novel modeling scheme for discrete-mixture HMMs (DMHMMs) by using maximum a posteriori (MAP) estimation. The MAP-estimated DMHMMs are then used for speech recognition to improve accuracy under noisy conditions. The DMHMMs were originally proposed to reduce calculation costs in the decoding process [1][2]. We propose a new method for MAP estimation of DMHMM parameters...
Noisy speech recognition by using output combination of discrete-mixture HMMs and continuous-mixture HMMs
This paper presents an output combination approach for noise-robust speech recognition. The aim of this work is to improve recognition performance for adverse conditions that contain both stationary and non-stationary noise. In the proposed method, both discrete-mixture HMMs (DMHMMs) and continuous-mixture HMMs (CMHMMs) are used as acoustic models. In the DMHMM, subvector quantization is used i...
IVN-Based Joint Training Of GMM And HMMs Using An Improved VTS-Based Feature Compensation For Noisy Speech Recognition
In our previous work, we proposed a feature compensation approach using high-order vector Taylor series approximation for noisy speech recognition. In this paper, first we improve the feature compensation in both efficiency and accuracy by boosted mixture learning of GMM, applying higher order information of VTS approximation only to the noisy speech mean parameters, acoustic context expansion,...
Using deep neural networks to improve proficiency assessment for children English language learners
We investigated the use of context-dependent deep neural network hidden Markov models, or CD-DNN-HMMs, to improve speech recognition performance for a better assessment of children English language learners (ELLs). The ELL data used in the present study was obtained from a large language assessment project administered in schools in a U.S. state. Our DNN-based speech recognition system, built u...
Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions
Automatic recognition of emotional states from speech in noisy conditions has become an important research topic in emotional speech recognition in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...